Balancing Techniques for Counterfactual Explanations of Student Success Prediction Models

Jakub Kuzilek, Mustafa Cavus

Introduction

Implementation of AI-based Feedback and Assessment with Trusted Learning Analytics in Higher Education Institutions

Introduction

Research context



Introduction

Student success prediction


Introduction

Research questions
  1. What is the most appropriate method for generating counterfactual explanations after balancing?

  2. How do balancing techniques affect counterfactual explanations of student success prediction models?

Why?
Missing evaluation can lead to ineffective explanations, compromising trustworthiness

Data


Methods

Machine Learning

What is Machine Learning?

Methods

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed”
~ Arthur Samuel, 1959

  • ML is a subfield of Computer Science
  • Objective: Generalize from experience
  • Approach: Use data to train a model

Methods

Supervised Learning

What is Supervised Learning?

Methods

Supervised Learning
  • We know the right answers
  • Inferring decision function from labelled training data
  • Generalize to unseen data “reasonably”
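As a minimal sketch (using scikit-learn and synthetic data, not the study's own pipeline), supervised learning infers a decision function from labelled training data and checks how well it generalizes to held-out examples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labelled data: X holds features, y the known "right answers"
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Infer a decision function from the labelled training data
model = LogisticRegression().fit(X_train, y_train)

# Generalize: evaluate on unseen data
accuracy = model.score(X_test, y_test)
```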

Methods

Classification

What is Classification?

Methods

Classification
  • Predict categorical labels
  • Types
    • Binary
    • Multi-class
    • Multi-label

Methods

Classification models
  • Logistic Regression
  • Naive Bayes
  • k-Nearest Neighbors
  • Support Vector Machines
  • Decision Tree
  • Random Forest
  • Neural Networks
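Several of the models listed above share the same fit/score interface in scikit-learn, so they can be compared with a few lines (an illustrative sketch on synthetic data, not the configuration used in the study):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)

# A subset of the model families listed on this slide
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "k-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
}
# Training-set accuracy for each model (for illustration only)
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```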

Methods

Imbalanced data

What is imbalanced data?

Methods

Imbalanced data
  • Class distribution is not uniform
  • One class is significantly more frequent than the other
  • Example: 90% of students pass, 10% fail
  • Problem: Classifier may be biased towards majority class
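The majority-class bias can be made concrete: on a hypothetical 90/10 pass/fail split, a classifier that always predicts "pass" already reaches about 90% accuracy while never detecting a failing student (a sketch with synthetic labels):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: ~90% pass (1), ~10% fail (0)
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.9).astype(int)
X = rng.normal(size=(1000, 3))  # features carry no signal here

# Always predicting the majority class already scores ~90% accuracy
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
accuracy = baseline.score(X, y)
```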

Methods

Balancing techniques
  • Oversampling
  • Undersampling
  • Synthetic Minority Oversampling Technique (SMOTE)
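The three techniques can be sketched in plain NumPy (a simplified illustration: real SMOTE interpolates between a minority sample and one of its k nearest minority neighbours, whereas this sketch picks a random minority pair; library implementations such as imbalanced-learn would normally be used):

```python
import numpy as np

rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, size=(10, 3))   # minority class (e.g. "fail")
X_maj = rng.normal(loc=0.0, size=(90, 3))   # majority class (e.g. "pass")

# Oversampling: duplicate minority samples until class sizes match
idx = rng.integers(0, len(X_min), size=len(X_maj))
X_over = X_min[idx]

# Undersampling: drop majority samples down to the minority size
idx = rng.choice(len(X_maj), size=len(X_min), replace=False)
X_under = X_maj[idx]

# SMOTE-style synthesis: interpolate between two minority samples
def smote_sample(X, rng):
    a, b = X[rng.choice(len(X), size=2, replace=False)]
    return a + rng.random() * (b - a)

X_synth = np.array(
    [smote_sample(X_min, rng) for _ in range(len(X_maj) - len(X_min))]
)
```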

Methods

Explainable AI

What is Explainable AI?

Methods

Explainable AI
  • AI systems are often black boxes
  • Lack of transparency
  • Objective: Make AI systems more interpretable
  • Why?: Trustworthiness, accountability, and ethical considerations
  • Methods: Feature importance, LIME, SHAP, counterfactual explanations

Methods

Counterfactual explanations
  • Definition: A counterfactual explanation (CE) identifies the changes to the input data needed to change the model's output
  • Methods: WhatIf, MOC, NICE

Methods

Counterfactual explanations
  • WhatIf: Generates CEs closest to the original instance based on the Gower distance
  • Multiobjective Counterfactual Explanations (MOC): Optimizes multiple objectives simultaneously
  • Nearest Instance Counterfactual Explanations (NICE): Finds the nearest instance using the Heterogeneous Euclidean-Overlap Metric (HEOM) and then optimizes
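The nearest-instance idea behind WhatIf and NICE can be sketched as a search over the training data for the closest instance that the model assigns the opposite label (a toy illustration using Euclidean distance, not the actual Gower- or HEOM-based implementations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

def nearest_instance_counterfactual(x, X_train, model):
    """Return the training instance closest to x with a different predicted class."""
    preds = model.predict(X_train)
    target = 1 - model.predict(x.reshape(1, -1))[0]  # flip the binary label
    candidates = X_train[preds == target]
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)]

x = X[0]
cf = nearest_instance_counterfactual(x, X, model)
```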

Methods

CEs Evaluation
  • Sparsity - number of variables changed
  • Minimality - scale of the changes in variable values
  • Validity - whether the counterfactual actually changes the model's prediction
  • Proximity - distance between the factual and counterfactual instance
  • Plausibility - whether the counterfactual lies within the data distribution
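Two of these metrics are straightforward to compute directly; as a sketch on a hypothetical factual/counterfactual pair (using the L1 distance as one common choice of proximity measure):

```python
import numpy as np

# Hypothetical factual and counterfactual instances
factual = np.array([0.5, 1.0, 3.0, 0.0])
counterfactual = np.array([0.5, 1.4, 3.0, 1.0])

# Sparsity: number of variables changed
sparsity = int(np.sum(factual != counterfactual))

# Proximity: L1 distance between factual and counterfactual
proximity = float(np.sum(np.abs(factual - counterfactual)))
```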

Methods

Experiment workflow

Results


Conclusion

  • MOC and WhatIf performed worst
  • NICE performed best regardless of the balancing technique
  • Balancing strategies impact the performance of CE methods
  • This is most notable for the underperforming approaches

Thank you